In this article, a novel voice activity detection (VAD) approach based on phoneme recognition with a Gaussian mixture model based hidden Markov model (GMM/HMM) is proposed. Several speech features, such as high-order statistics (HOS), harmonic structure information, and Mel-frequency cepstral coefficients (MFCCs), are employed to represent each speech/non-speech segment. The main idea of the method is to treat non-speech as a new phoneme alongside the conventional phonemes of Mandarin; all of these models are then trained under the maximum likelihood principle with the Baum-Welch algorithm in the GMM/HMM framework. Finally, the Viterbi decoding algorithm is used to search for the maximum-likelihood state sequence given the observed signal. The proposed method shows higher speech/non-speech detection accuracy than some existing VAD methods over a wide range of SNR regimes. We also propose a different method to demonstrate that a conventional speech enhancement method relying on accurate VAD alone is not effective enough for automatic speech recognition (ASR) in low-SNR regimes.
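The decoding step can be illustrated with a minimal sketch of log-domain Viterbi decoding. This is not the authors' implementation: it assumes only two states (0 = non-speech, 1 = speech) instead of a full phoneme set, and the initial, transition, and per-frame likelihood values are hypothetical numbers standing in for the outputs of trained GMMs.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path for an HMM.

    log_init:  (S,)   log initial state probabilities
    log_trans: (S, S) log transition probabilities, row i -> column j
    log_emit:  (T, S) log likelihood of each frame under each state
    """
    T, S = log_emit.shape
    delta = np.empty((T, S))            # best log score ending in each state
    psi = np.zeros((T, S), dtype=int)   # back-pointers to the best predecessor
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        # scores[i, j]: best score at t-1 in state i, then transition i -> j
        scores = delta[t - 1][:, None] + log_trans
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(S)] + log_emit[t]
    # Backtrack from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Hypothetical toy model: state 0 = non-speech, state 1 = speech
log_init = np.log(np.array([0.5, 0.5]))
log_trans = np.log(np.array([[0.9, 0.1],
                             [0.1, 0.9]]))   # "sticky" states
# Hypothetical per-frame likelihoods (stand-ins for GMM outputs)
log_emit = np.log(np.array([[0.9, 0.1],
                            [0.8, 0.2],
                            [0.4, 0.6],
                            [0.9, 0.1],
                            [0.1, 0.9],
                            [0.2, 0.8],
                            [0.1, 0.9]]))

path = viterbi(log_init, log_trans, log_emit)
print(path.tolist())
```

In this toy input the third frame slightly favors speech when considered alone, but the decoded path keeps it as non-speech because switching states twice costs more than the small gain in frame likelihood. That smoothing across frames is what distinguishes sequence decoding from a frame-by-frame threshold.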